228 research outputs found
Conic Multi-Task Classification
Traditionally, Multi-task Learning (MTL) models optimize the average of
task-related objective functions, which is an intuitive approach and which we
will be referring to as Average MTL. However, a more general framework,
referred to as Conic MTL, can be formulated by considering conic combinations
of the objective functions instead; in this framework, Average MTL arises as a
special case, when all combination coefficients equal 1. Although the advantage
of Conic MTL over Average MTL has been shown experimentally in previous works,
no theoretical justification has been provided to date. In this paper, we
derive a generalization bound for the Conic MTL method, and demonstrate that
the tightest bound is not necessarily achieved, when all combination
coefficients equal 1; hence, Average MTL may not always be the optimal choice,
and it is important to consider Conic MTL. As a byproduct of the generalization
bound, it also theoretically explains the good experimental results of previous
relevant works. Finally, we propose a new Conic MTL model, whose conic
combination coefficients minimize the generalization bound, instead of choosing
them heuristically as has been done in previous methods. The rationale and
advantage of our model is demonstrated and verified via a series of experiments
by comparing with several other methods.Comment: Accepted by European Conference on Machine Learning and Principles
and Practice of Knowledge Discovery in Databases (ECMLPKDD)-201
Multi-source Domain Adaptation via Weighted Joint Distributions Optimal Transport
This work addresses the problem of domain adaptation on an unlabeled target dataset using knowledge from multiple labelled source datasets. Most current approaches tackle this problem by searching for an embedding that is invariant across source and target domains, which corresponds to searching for a universal classifier that works well on all domains. In this paper, we address this problem from a new perspective: instead of crushing diversity of the source distributions, we exploit it to adapt better to the target distribution. Our method, named Multi-Source Domain Adaptation via Weighted Joint Distribution Optimal Transport (MSDA-WJDOT), aims at finding simultaneously an Optimal Transport-based alignment between the source and target distributions and a re-weighting of the sources distributions. We discuss the theoretical aspects of the method and propose a conceptually simple algorithm. Numerical experiments indicate that the proposed method achieves state-of-the-art performance on simulated and real datasets
Relative Comparison Kernel Learning with Auxiliary Kernels
In this work we consider the problem of learning a positive semidefinite
kernel matrix from relative comparisons of the form: "object A is more similar
to object B than it is to C", where comparisons are given by humans. Existing
solutions to this problem assume many comparisons are provided to learn a high
quality kernel. However, this can be considered unrealistic for many real-world
tasks since relative assessments require human input, which is often costly or
difficult to obtain. Because of this, only a limited number of these
comparisons may be provided. In this work, we explore methods for aiding the
process of learning a kernel with the help of auxiliary kernels built from more
easily extractable information regarding the relationships among objects. We
propose a new kernel learning approach in which the target kernel is defined as
a conic combination of auxiliary kernels and a kernel whose elements are
learned directly. We formulate a convex optimization to solve for this target
kernel that adds only minor overhead to methods that use no auxiliary
information. Empirical results show that in the presence of few training
relative comparisons, our method can learn kernels that generalize to more
out-of-sample comparisons than methods that do not utilize auxiliary
information, as well as similar methods that learn metrics over objects
A Unifying View of Multiple Kernel Learning
Recent research on multiple kernel learning has lead to a number of
approaches for combining kernels in regularized risk minimization. The proposed
approaches include different formulations of objectives and varying
regularization strategies. In this paper we present a unifying general
optimization criterion for multiple kernel learning and show how existing
formulations are subsumed as special cases. We also derive the criterion's dual
representation, which is suitable for general smooth optimization algorithms.
Finally, we evaluate multiple kernel learning in this framework analytically
using a Rademacher complexity bound on the generalization error and empirically
in a set of experiments
Large-scale Nonlinear Variable Selection via Kernel Random Features
We propose a new method for input variable selection in nonlinear regression.
The method is embedded into a kernel regression machine that can model general
nonlinear functions, not being a priori limited to additive models. This is the
first kernel-based variable selection method applicable to large datasets. It
sidesteps the typical poor scaling properties of kernel methods by mapping the
inputs into a relatively low-dimensional space of random features. The
algorithm discovers the variables relevant for the regression task together
with learning the prediction model through learning the appropriate nonlinear
random feature maps. We demonstrate the outstanding performance of our method
on a set of large-scale synthetic and real datasets.Comment: Final version for proceedings of ECML/PKDD 201
Analyzing sensory data using non-linear preference learning with feature subset selection
15th European Conference on Machine Learning, Pisa, Italy, September 20-24, 2004The quality of food can be assessed from different points of view. In this paper, we deal with those aspects that can be appreciated through sensory impressions. When we are aiming to induce a function that maps object descriptions into ratings, we must consider that consumers’ ratings are just a way to express their preferences about the products presented in the same testing session. Therefore, we postulate to learn from consumers’ preference judgments instead of using an approach based on regression. This requires the use of special purpose kernels and feature subset selection methods. We illustrate the benefits of our approach in two families of real-world data base
ABCD Neurocognitive Prediction Challenge 2019: Predicting individual fluid intelligence scores from structural MRI using probabilistic segmentation and kernel ridge regression
We applied several regression and deep learning methods to predict fluid
intelligence scores from T1-weighted MRI scans as part of the ABCD
Neurocognitive Prediction Challenge (ABCD-NP-Challenge) 2019. We used voxel
intensities and probabilistic tissue-type labels derived from these as features
to train the models. The best predictive performance (lowest mean-squared
error) came from Kernel Ridge Regression (KRR; ), which produced a
mean-squared error of 69.7204 on the validation set and 92.1298 on the test
set. This placed our group in the fifth position on the validation leader board
and first place on the final (test) leader board.Comment: Winning entry in the ABCD Neurocognitive Prediction Challenge at
MICCAI 2019. 7 pages plus references, 3 figures, 1 tabl
Feature selection for chemical sensor arrays using mutual information
We address the problem of feature selection for classifying a diverse set of chemicals using an array of metal oxide sensors. Our aim is to evaluate a filter approach to feature selection with reference to previous work, which used a wrapper approach on the same data set, and established best features and upper bounds on classification performance. We selected feature sets that exhibit the maximal mutual information with the identity of the chemicals. The selected features closely match those found to perform well in the previous study using a wrapper approach to conduct an exhaustive search of all permitted feature combinations. By comparing the classification performance of support vector machines (using features selected by mutual information) with the performance observed in the previous study, we found that while our approach does not always give the maximum possible classification performance, it always selects features that achieve classification performance approaching the optimum obtained by exhaustive search. We performed further classification using the selected feature set with some common classifiers and found that, for the selected features, Bayesian Networks gave the best performance. Finally, we compared the observed classification performances with the performance of classifiers using randomly selected features. We found that the selected features consistently outperformed randomly selected features for all tested classifiers. The mutual information filter approach is therefore a computationally efficient method for selecting near optimal features for chemical sensor arrays
Evolving training sets for improved transfer learning in brain computer interfaces
A new proof-of-concept method for optimising the performance of Brain Computer Interfaces (BCI) while minimising the quantity of required training data is introduced. This is achieved by using an evolutionary approach to rearrange the distribution of training instances, prior to the construction of an Ensemble Learning Generic Information (ELGI) model. The training data from a population was optimised to emphasise generality of the models derived from it, prior to a re-combination with participant-specific data via the ELGI approach, and training of classifiers. Evidence is given to support the adoption of this approach in the more difficult BCI conditions: smaller training sets, and those suffering from temporal drift. This paper serves as a case study to lay the groundwork for further exploration of this approach
- …